AITopics | noiseless linear model

A Linear regression with Gaussian features

Neural Information Processing SystemsFeb-7-2026, 16:24:29 GMT

In the setting of Section 2.1, we assume The proof is based on the following lemma, that we state clearly for another use below. Lemma 2. Let θ H . Then for all β This lemma follows from Hölder's inequality with Applying Hölder's inequality, we get E We start with a few preliminary remarks. By summing for k = 1,...,n and using the bound (17), ϕ We continue the proof of Theorem 1 to prove Theorem 3. By the log-convexity Property 1, for all This proves conclusion 1 of the theorem. Both terms of the equality can be infinite: here we are using the convention stated in Section 2.1 that We can assume that (a) is satisfied, i.e., Thus the theorem below extends Theorem 5. Theorem 6. This theorem is proved at the end of this section.

artificial intelligence, machine learning, nullnull, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Add feedback

1b33d16fc562464579b7199ca3114982-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 16:24:23 GMT

artificial intelligence, convergence rate, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Neural Information Processing SystemsDec-23-2025, 19:47:33 GMT

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs~$U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(U)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a.

name change, stochastic gradient descent, tight nonparametric convergence rate, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Add feedback

1b33d16fc562464579b7199ca3114982-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 08:09:13 GMT

artificial intelligence, machine learning, nullnull, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

1b33d16fc562464579b7199ca3114982-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 08:09:06 GMT

artificial intelligence, convergence rate, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Neural Information Processing SystemsMay-26-2025, 17:27:45 GMT

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y \langle \theta_*, \Phi(U) \rangle between the random output Y and the random feature vector \Phi(U), a potentially non-linear transformation of the inputs U . We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum \theta_* and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum \theta_* and of the feature vectors \Phi(U) . We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel.

artificial intelligence, machine learning, tight nonparametric convergence rate, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Review for NeurIPS paper: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Neural Information Processing SystemsJan-22-2025, 06:22:48 GMT

Weaknesses: While the paper is written very clearly, there are several questions I'd like to raise. Firstly, in discussing the applicability of the results the paper mentions'some basic vision or sound recognition tasks' (line 33) - I'd like to ask about some examples of such tasks. Looking at the statement of the Theorem 1, seems that it should be applicable in finite-dimensional spaces with invertible covariance matrices. If it is so, then I do not understand the results. In particular, for X distributed with a finite support and has identity covariance matrix, the conditions (a) and (b) hold for arbitrarily large positive \alpha, however the theorem statement implies that the estimates will go to zero at an arbitrarily large polynomial rate, which is not true.

noiseless linear model, stochastic gradient descent, tight nonparametric convergence rate, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Review for NeurIPS paper: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Neural Information Processing SystemsJan-22-2025, 06:22:40 GMT

The submission considers noiseless (/low noise) linear regression with non-linear transformation of the input data, and show that under this setting, SGD achieves faster convergence rates. This is a very nice contribution with applicability to important problems as mentioned by the authors in their feedback. We urge the authors to incorporate the points they made in response to the reviews.

artificial intelligence, machine learning, tight nonparametric convergence rate, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Neural Information Processing SystemsOct-9-2024, 16:09:06 GMT

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y \langle \theta_*, \Phi(U) \rangle between the random output Y and the random feature vector \Phi(U), a potentially non-linear transformation of the inputs U . We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum \theta_* and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum \theta_* and of the feature vectors \Phi(U) . We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel.

noiseless linear model, stochastic gradient descent, tight nonparametric convergence rate, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Berthier, Raphaël, Bach, Francis, Gaillard, Pierre

arXiv.org Machine LearningOct-27-2020

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, X \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.

artificial intelligence, assumption, machine learning, (17 more...)

arXiv.org Machine Learning

2006.08212

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Filters

Collaborating Authors

noiseless linear model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

A Linear regression with Gaussian features

1b33d16fc562464579b7199ca3114982-Paper.pdf

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

1b33d16fc562464579b7199ca3114982-Supplemental.pdf

1b33d16fc562464579b7199ca3114982-Paper.pdf

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Review for NeurIPS paper: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Review for NeurIPS paper: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model